{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hackathon #3\n",
    "\n",
    "Written by Eleanor Quint\n",
    "\n",
    "Topics:\n",
    "- Subclassing `tf.module`\n",
    "- Saving and loading TensorFlow models\n",
    "- Running TensorFlow-based Python programs on Crane\n",
    "- Overfitting, regularization, and early stopping\n",
    "\n",
    "This is all setup in a IPython notebook so you can run any code you want to experiment with. Feel free to edit any cell, or add some to run your own code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We'll start with our library imports...\n",
    "from __future__ import print_function\n",
    "\n",
    "import numpy as np                 # to use numpy arrays\n",
    "import tensorflow as tf            # to specify and run computation graphs\n",
    "import tensorflow_datasets as tfds # to load training data\n",
    "import matplotlib.pyplot as plt    # to visualize data and draw plots\n",
    "from tqdm import tqdm              # to track progress of loops"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Subclassing `tf.module`\n",
    "\n",
    "The most flexible way to specify a model is by subclassing `tf.module`. This allows a model to be specified in a general way. The key function to implement is `__call__` which should take in data, run the model forward, and return the output. The function will frequently be decorated with `tf.function` (when not debugging the model), so that it can be compiled and run more quickly.\n",
    "\n",
    "An example implementing a dense layer (generally you should use `tf.keras.layers.Dense`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Dense(tf.Module):\n",
    "    def __init__(self, output_size, activation=tf.nn.relu, name=None):\n",
    "        super(Dense, self).__init__(name=name) # remember this call to initialize the superclass\n",
    "        self.output_size = output_size\n",
    "        self.activation = activation\n",
    "        self.is_built = False\n",
    "\n",
    "    def build(self, x):\n",
    "        input_dim = x.shape[-1]\n",
    "        self.w = tf.Variable(\n",
    "          tf.random.normal([input_dim, self.output_size]), name='w')\n",
    "        self.b = tf.Variable(tf.zeros([self.output_size]), name='b')\n",
    "        self.is_built = True\n",
    "\n",
    "    @tf.function\n",
    "    def __call__(self, x):\n",
    "        if not self.is_built:\n",
    "            self.build(x)\n",
    "        y = tf.matmul(x, self.w) + self.b\n",
    "        return self.activation(y)\n",
    "\n",
    "# Create an instance of the layer\n",
    "dense_layer = Dense(10)\n",
    "# Call the model by passing the input to it\n",
    "dense_layer(tf.ones([32,3]))\n",
    "# We can get the variables of a Module for used calculating and applying gradients with `.trainable_variables`\n",
    "dense_layer.trainable_variables"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Saving and Loading TensorFlow models\n",
    "\n",
    "There are two main ways to save and load TF models: `tf.train.Checkpoint` and `tf.SavedModel`. First, we'll look at `tf.train.Checkpoint`. It's best used in the process of training rather than for serving models. This is because it only saves and loads the variables, and not the structure of the model. Thus, to use it, you must first instantiate the model from Python code and then load the variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# we have to pass in the model (and anything else we want to save) as a kwarg\n",
    "dense_layer = Dense(10)\n",
    "optimizer = tf.keras.optimizers.Adam()\n",
    "checkpoint = tf.train.Checkpoint(model=dense_layer, optimizer=optimizer)\n",
    "\n",
    "# Save a checkpoint to /tmp/training_checkpoints-{save_counter}. Every time\n",
    "# checkpoint.save is called, the save counter is increased.\n",
    "save_dir = checkpoint.save('/tmp/training_checkpoints')\n",
    "\n",
    "# Restore the checkpointed values to the `model` object.\n",
    "print(\"The save path is\", save_dir)\n",
    "status = checkpoint.restore(save_dir)\n",
    "# we can check that everything loaded correctly, this is silent if all is well\n",
    "status.assert_consumed()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The other way to save your model is with `tf.SavedModel`, which saves the variables and structure of the model. Specifically, it only saves methods which have been traced with `tf.function`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We can manually trace a function that has been decorated with tf.function using get_concrete_function\n",
    "# We pass in the call signature. Here, None means that any number could fill in.\n",
    "# Typically we don't have to do this explicitly, unless we want the None in the first dimension (as in the homework)\n",
    "dense_layer = Dense(10)\n",
    "fn = dense_layer.__call__.get_concrete_function(\n",
    "    x=tf.TensorSpec([None, 3], tf.float32))\n",
    "\n",
    "# We can call the function we traced\n",
    "fn(tf.zeros([1,3]))\n",
    "print(fn(tf.ones([3,3])))\n",
    "\n",
    "tf.saved_model.save(dense_layer, '/tmp/saved_model')\n",
    "\n",
    "del dense_layer\n",
    "\n",
    "restored_dense = tf.saved_model.load('/tmp/saved_model')\n",
    "# this should be the same result as above\n",
    "print(restored_dense(tf.ones([3,3])))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Running TensorFlow-based Python programs on Crane\n",
    "\n",
    "#### 1. Get a shell on Crane\n",
    "To access a shell on Crane's login node, you can run `ssh <username>@crane.unl.edu` or visit `crane.unl.edu` in a browser. If you use the browser, login and then use the dropdown: `Clusters > Crane Shell Access`. Because this shell is on the login node, you shouldn't run your jobs directly. Instead, you can submit your job to a cluster node to be run with a GPU.\n",
    "\n",
    "#### 2. Get the slurm submit script and set up your anaconda environment\n",
    "Try this out by saving the following bash script as `submit_gpu.sh` and change `<your env name>` to the name of your anaconda environment. You should create the anaconda environment by following the [Crane docs](https://hcc.unl.edu/docs/applications/user_software/using_anaconda_package_manager/#creating-custom-gpu-anaconda-environment) replacing the first `module load` command with `module load tensorflow-gpu/py38/2.3`.\n",
    "\n",
    "```bash\n",
    "#!/bin/sh\n",
    "#SBATCH --time=6:00:00          # Maximum run time in hh:mm:ss\n",
    "#SBATCH --mem=16000             # Maximum memory required (in megabytes)\n",
    "#SBATCH --job-name=default_479  # Job name (to track progress)\n",
    "#SBATCH --partition=cse479      # Partition on which to run job\n",
    "#SBATCH --gres=gpu:1            # Don't change this, it requests a GPU\n",
    "\n",
    "module load anaconda\n",
    "conda activate <your env name>\n",
    "# This line runs everything that is put after \"sbatch submit_gpu.sh ...\"\n",
    "$@\n",
    "```\n",
    "\n",
    "#### 3. Submit a job\n",
    "Once you've got your script, you can run it like so to submit a job: `sbatch submit_gpu.sh python <filepath>.py`. This will submit a job to the `cse479` partition. Each student is only allowed one job at a time on this partition, and you may check the status of your jobs with `squeue -u <username>`. If you would like to submit more than one job (which we encourage generally), you can submit the extras to the `cse479_preempt` partition by substituting in the line `#SBATCH --partition=cse479_preempt`. You can submit an unlimited number of jobs to this queue, but they may be interrupted anytime another student needs a gpu on the `cse479` partition and there are no others available. Thus, we reccomend saving your model periodically so that you can restart training from where it was interrupted rather than having to start all over again.\n",
    "\n",
    "You can cancel jobs with `scancel <JOBID>`, where the job ID is the number associated with the job you can see in squeue or right after you submit the job, or with `scancel -u <username>` which cancels all your running jobs. For more details, please visit the [HCC docs](https://hcc-docs.unl.edu/), ask a question on Piazza, or come to office hours."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Overfitting, regularization, and early stopping\n",
    "\n",
    "If we have enough parameters in our model, and little enough data, after a long period of training we begin to experience overfitting. Empirically, this is when the loss value of data in training drops significantly below the loss value of the data set aside for testing. It means that the model is looking for patterns specific to the training data that won't generalize to future, unseen data. This is a problem.\n",
    "\n",
    "Solutions? Here are some first steps to think about:\n",
    "1. Get more data for the training set\n",
    "2. Reduce the number of model parameters\n",
    "3. Regularize the scale of the model parameters\n",
    "4. Regularize using dropout\n",
    "5. Early Stopping\n",
    "\n",
    "We'll go over how to do 3, 4, and 5 here.\n",
    "\n",
    "#### L2 Regularization\n",
    "We calculate l2 loss on the value of the weight matrix, so it's invariant to the input value. We'll add the regularization loss value to the total loss value so that it's included in the gradient update. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "L2_COEFF = 0.1 # Controls how strongly to use regularization\n",
    "\n",
    "class L2DenseNetwork(tf.Module):\n",
    "    def __init__(self, name=None):\n",
    "        super(L2DenseNetwork, self).__init__(name=name) # remember this call to initialize the superclass\n",
    "        self.dense_layer1 = tf.keras.layers.Dense(32, activation=tf.nn.relu)\n",
    "        self.dense_layer2 = tf.keras.layers.Dense(10)\n",
    "        \n",
    "    def l2_loss(self):\n",
    "        # Make sure the network has been called at least once to initialize the dense layer kernels\n",
    "        return tf.nn.l2_loss(self.dense_layer1.kernel) + tf.nn.l2_loss(self.dense_layer2.kernel)\n",
    "\n",
    "    @tf.function\n",
    "    def __call__(self, x):\n",
    "        embed = self.dense_layer1(x)\n",
    "        output = self.dense_layer2(embed)\n",
    "        return output\n",
    "    \n",
    "# Defining, creating and calling the network repeatedly will trigger a WARNING about re-tracing the function\n",
    "# So we'll check to see if the variable exists already\n",
    "if 'l2_dense_net' not in locals():\n",
    "    l2_dense_net = L2DenseNetwork()\n",
    "l2_dense_net(tf.ones([1, 100]))\n",
    "\n",
    "l2_loss = l2_dense_net.l2_loss()                     # calculate l2 regularization loss\n",
    "cross_entropy_loss = 0.                              # calculate the classification loss\n",
    "total_loss = cross_entropy_loss + L2_COEFF * l2_loss # and add to the total loss, then calculate gradients"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Dropout\n",
    "\n",
    "Let's re-specify the network with regularization from [dropout](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout). The Dropout layer randomly sets values to 0 with a frequency of the `rate` input (given when the layer is constructed). Inputs not set to 0 are scaled up by 1/(1 - rate) such that the sum over all inputs is unchanged."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class DropoutDenseNetwork(tf.Module):\n",
    "    def __init__(self, name=None):\n",
    "        super(DropoutDenseNetwork, self).__init__(name=name) # remember this call to initialize the superclass\n",
    "        self.dense_layer1 = Dense(32)\n",
    "        self.dropout = tf.keras.layers.Dropout(0.2)\n",
    "        self.dense_layer2 = Dense(10, activation=tf.identity)\n",
    "\n",
    "    @tf.function\n",
    "    def __call__(self, x, is_training):\n",
    "        embed = self.dense_layer1(x)\n",
    "        propn_zero_before = tf.reduce_mean(tf.cast(tf.equal(embed, 0.), tf.float32))\n",
    "        embed = self.dropout(embed, is_training)\n",
    "        propn_zero_after = tf.reduce_mean(tf.cast(tf.equal(embed, 0.), tf.float32))\n",
    "        # Note that in a tf.function, we have to use tf.print to print the value of tensors\n",
    "        tf.print('Zeros before and after:', propn_zero_before, \"and\", propn_zero_after)\n",
    "        output = self.dense_layer2(embed)\n",
    "        return output\n",
    "\n",
    "# Defining, creating and calling the network repeatedly will trigger a WARNING about re-tracing the function\n",
    "# So we'll check to see if the variable exists already\n",
    "if 'drop_dense_net' not in locals():\n",
    "    drop_dense_net = DropoutDenseNetwork()\n",
    "drop_dense_net(tf.ones([1, 100]), tf.constant(True))\n",
    "print(\"Something to think about: why isn't the difference exactly equal to the proportion we passed to dropout?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Early Stopping\n",
    "\n",
    "Each gradient descent update takes only a small step, so we want to look at each input datum many times. How do we know when to stop though? We want to keep training until improvement stops, but because neural networks are non-linear, they might get worse before they get better, so we don't want to stop them after getting worse one time. We'll use the following code to do \"early stopping\" with patience. We pass in the validation loss (Note: make sure you use validation loss, not training loss. This is important) and `check` will tell us whether we should stop. It will return `True` after the loss hasn't improved for `patience` epochs.\n",
    "\n",
    "Although you might choose whether or not the last two regularizers are appropriate based on the problem, you should always use early stopping."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class EarlyStopping:\n",
    "    def __init__(self, patience=5, epsilon=1e-4):\n",
    "        \"\"\"\n",
    "        Args:\n",
    "            patience (int): how many epochs of not improving before stopping training\n",
    "            epsilon (float): minimum amount of improvement required to reset counter\n",
    "        \"\"\"\n",
    "        self.patience = patience\n",
    "        self.epsilon = epsilon\n",
    "        self.best_loss = float('inf')\n",
    "        self.epochs_waited = 0\n",
    "    \n",
    "    def __str__(self):\n",
    "        return \"Early stopping has waited {} epochs out of {} at loss {}\".format(self.epochs_waited, self.patience, self.best_loss)\n",
    "        \n",
    "    def check(self, loss):\n",
    "        \"\"\"\n",
    "        Call after each epoch to check whether training should halt\n",
    "        \n",
    "        Args:\n",
    "            loss (float): loss value from the most recent epoch of training\n",
    "            \n",
    "        Returns:\n",
    "            True if training should halt, False otherwise\n",
    "        \"\"\"\n",
    "        if loss < (self.best_loss - self.epsilon):\n",
    "            self.best_loss = loss\n",
    "            self.epochs_waited = 0\n",
    "            return False\n",
    "        else:\n",
    "            self.epochs_waited += 1\n",
    "            return self.epochs_waited > self.patience\n",
    "            \n",
    "early_stop_module = EarlyStopping()\n",
    "# pass in validation loss at each training epoch\n",
    "print(\"Training...\")\n",
    "print(\"Should we stop training?\", early_stop_module.check(1.4))\n",
    "print(early_stop_module)\n",
    "print(\"Training...\")\n",
    "print(\"Should we stop training?\", early_stop_module.check(2.3))\n",
    "print(early_stop_module)\n",
    "print(\"Training...\")\n",
    "print(\"Should we stop training?\", early_stop_module.check(1.2))\n",
    "print(early_stop_module)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Homework\n",
    "\n",
    "Your homework is to complete the following two tasks:\n",
    "1. Make sure you're comfortable submitting you're comfortable submitting jobs on Crane and saving models. Submit a job to `cse479` and put the job id from the submission in a text file along with;\n",
    "2. Think about the question posed above about dropout, \"why isn't the difference between the number of zeros before and after applying dropout exactly equal to the dropout proportion?\" Consider the network architecture and what operations were run before the dropout layer. Write a few sentences about this in the same text file as the previous question and submit to Canvas.\n",
    "\n",
    "I'm expecting this to take about an hour (or less if you're experienced). Feel free to use any code from this or previous hackathons. If you don't understand how to do any part of this or if it's taking you longer than that, please let me know in office hours or by email (both can be found on the syllabus). I'm also happy to discuss if you just want to ask more questions about anything in this notebook!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "dl_notebooks",
   "language": "python",
   "name": "dl_notebooks"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
